Data-to-Decisions Systems
Issues - Time and Volume
[Figure: threats by mission area; axis label: Decision Latency (s)]

•  Defend United States
-  1. Containerized Nuclear Weapon*
-  2. Blackmail ICBM*
-  3. LACM off barge

•  Counterinsurgency
-  4. Unguided Battlefield Rocket
-  5. Insurgencies
-  6. IEDs
-  7. Small fast attack craft

•  Anti-Access Environments
-  8. Quiet submarines
-  9. MARV (Intercept)
-  10. Mobile long-range SAMs
-  11. Co-orbital ASAT

•  Security Capacity
-  12. Stability Operations

•  Counter WMD*
-  13. Loose Nukes


National security decision systems span all QDR missions, with a focus on
finding threats in a specified data volume, with limited manpower, within a
specified time window
Data-to-Decisions Systems
Issues - Personnel


[Figure: Predator sensor resolution and coverage are increasing, while the
number of highly skilled and trained analysts remains constant or decreases]


D2D Technology Assessment


•  Moderately  Mature 

•  Driven  by  IT  Industry 


•  Immature 

•  Driven  by  Defense 


•  Moderately  Mature 

•  Driven  by  IT  Industry 


Current  assessment  is  that  unstructured  data  analytics  is  the  most 
challenging  and  critical  component  of  D2D 


The ASD D2D program intends to provide representative data of various types,
with associated ground truth, to support development and evaluation of
algorithms and systems within a SOA that will be made available
Challenge Problem and Framework for Analysis

Illustrative Challenge Problem: Detect, Track, and Infer Intent of Objects in an
Urban Environment with All Source Data

Canonical Decision Support Architecture

Architectural Layers: User Interface; Analytics; Data Management;
Software Infrastructure; Hardware Infrastructure (Compute and Storage)

Operational Issues: Human Effectiveness; Operations Management

Inputs: Sensor (Spatial, Spectral); Sensor (Temporal); Sensor (Factual);
Sensor (CBRNE); Text


•  Sheer number of detections, tracklets, and track associations
overwhelms the limited number of analysts

•  Integration  of  tracks  from  disparate  modalities  is  manually 
intensive  and  time  consuming 

•  Developing  long  duration  tracks  to  support  social  network 
analysis  including  patterns  of  life 

•  Representation  of  unstructured  data  that  is  incomplete, 
imprecise,  uncertain,  and  contradictory  to  support  analysis, 
storage  and  retrieval 

•  Understanding  the  observed  data  in  the  context  of  multiple 
hypotheses  that  are  consistent  to  develop  indications  and 
warnings,  reduce  the  number  of  hypotheses,  and  to 
develop  new  hypotheses 
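
The last challenge above, weighing observed data against several competing hypotheses and reducing their number, can be illustrated with a plain Bayesian update. This is only a minimal sketch and not a method named in the briefing: the hypothesis names, the likelihood values, and the `prune_below` threshold are all hypothetical, and generating new hypotheses is not covered.

```python
from typing import Dict


def update_hypotheses(prior: Dict[str, float],
                      likelihood: Dict[str, float],
                      prune_below: float = 0.01) -> Dict[str, float]:
    """Apply Bayes' rule to competing hypotheses, then prune unlikely ones.

    prior:       P(hypothesis) before the new observation
    likelihood:  P(observation | hypothesis); missing entries count as 0
    prune_below: hypothetical cutoff under which a hypothesis is dropped
    """
    unnormalized = {h: p * likelihood.get(h, 0.0) for h, p in prior.items()}
    total = sum(unnormalized.values())
    if total <= 0.0:
        raise ValueError("observation is impossible under every hypothesis")
    posterior = {h: p / total for h, p in unnormalized.items()}

    # Always keep the most likely hypothesis so the result is never empty.
    best = max(posterior, key=posterior.get)
    kept = {h: p for h, p in posterior.items() if p >= prune_below or h == best}

    # Renormalize over the surviving hypotheses.
    kept_total = sum(kept.values())
    return {h: p / kept_total for h, p in kept.items()}


# Illustrative values only (assumptions, not data from the briefing).
prior = {"routine_delivery": 0.6, "surveillance": 0.3, "emplacement": 0.1}
likelihood = {"routine_delivery": 0.001, "surveillance": 0.5, "emplacement": 0.4}
posterior = update_hypotheses(prior, likelihood)
```

In this toy case the observation is very unlikely under "routine_delivery", so that hypothesis falls below the cutoff and is dropped, leaving two hypotheses for the analyst to consider.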


Meeting mission timelines and operating with large sources of
unstructured data require that the analyst in the loop be more effective
Highest Payoff Capabilities and
Associated Metrics

•  Data Management

-  Representations: Efficient representation of structured and unstructured data
supporting format normalization, mission-aware computation, and 100x compression
without loss of fidelity in applications

•  MOVINT Analysis

-  Automated tools that support a 100x improvement in the number of tracks that an
analyst manages

   •  Probability of correct association of tracklets and tracks > 0.98

   •  Time to achieve track association by automation less than current SOA

•  IMINT Analysis

-  Automated tools that support a 100x improvement in the number of objects, activities,
and events that an analyst can manage

   •  Probability of correct classification of objects, activities, and events > 0.98

   •  Time to develop objects, activities, and events less than current SOA

•  Text Analysis

-  Automated tools that improve by 100x the rate at which information is extracted from
documents in any language, with

   •  High probability of correct extraction

•  User Interface

-  Automated tools that align the information models of all participants in the distributed
man-machine enterprise with 98% accuracy
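
The association metric above (probability of correct association of tracklets and tracks > 0.98) could be scored against ground-truth data along the following lines. This is a hedged sketch: the function names, the dictionary layout, and the assumption that predicted and ground-truth track labels share one label space are illustrative choices, not something the briefing specifies.

```python
from typing import Dict, Hashable


def association_accuracy(predicted: Dict[Hashable, Hashable],
                         truth: Dict[Hashable, Hashable]) -> float:
    """Fraction of ground-truthed tracklets assigned to the correct track.

    Both mappings go from tracklet ID to track ID. Tracklets missing from
    `predicted` count as incorrect.
    """
    if not truth:
        raise ValueError("ground truth is empty")
    correct = sum(1 for tracklet, track in truth.items()
                  if predicted.get(tracklet) == track)
    return correct / len(truth)


def meets_goal(accuracy: float, goal: float = 0.98) -> bool:
    """True when the accuracy exceeds the stated goal."""
    return accuracy > goal


# Illustrative data: 100 tracklets spread over 10 tracks.
truth = {i: i // 10 for i in range(100)}

one_error = dict(truth)
one_error[0] = 9                      # one tracklet put on the wrong track
acc_one = association_accuracy(one_error, truth)

three_errors = dict(one_error)
three_errors[1] = 9
three_errors[2] = 9
acc_three = association_accuracy(three_errors, truth)
```

With one error in 100 tracklets the score is 0.99 and meets the goal; with three errors it is 0.97 and does not.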
Data Management Layer


•  Problem Statement: Increasing data volumes and modalities
have diminished our ability to communicate, store, retrieve, and
process sources within mission-critical timelines

•  3-to-5  year  timeframe  objective 

-  Computational  infrastructure  to  support  capturing,  processing,  marking, 
retrieval,  and  management  of  millions  of  information  objects  per  second  over 
discovery  mission  data  requirements  (PB/TB,  long  latency) 

-  Network  architecture  with  embedded  information  management  on  existing 
networks  to  support  both  real-time  (MB/GB,  low  latency)  and  assisted 
(GB/TB,  medium  latency)  mission  data  requirements 

•  7-to-10 year timeframe objective

-  Mission-aware  information  lifecycle  management  to  age  data  from  (typically) 
short-term  concrete  data  storage  to  longer-term  symbolic  associative 
representation  and  retrieval  based  upon  perceived  utility  and  cost 

-  Self-balancing  merged  storage  and  processing  architecture  to  support 
analytics  with  minimal  data  movement 

-  Synchronized  anticipatory  sensor  control  and  compute/storage  resource 
allocation  to  support  rapid  ingest  and  real-time  exploitation 
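
The lifecycle objective above, aging data out of short-term concrete storage based on perceived utility and cost, can be made concrete with a small decision rule. This is a sketch under stated assumptions only: the exponential utility decay, the `half_life_days` and `cost_per_gb` parameters, and the `InfoObject` fields are all hypothetical and do not come from the briefing.

```python
from dataclasses import dataclass


@dataclass
class InfoObject:
    name: str
    utility: float    # perceived mission utility when collected (hypothetical score)
    size_gb: float    # raw size held in short-term concrete storage
    age_days: float   # time since collection


def should_age_out(obj: InfoObject,
                   cost_per_gb: float = 0.01,
                   half_life_days: float = 30.0) -> bool:
    """Return True when decayed utility no longer justifies the holding cost.

    Utility is assumed to halve every `half_life_days`. The holding cost is
    expressed in the same arbitrary units as utility.
    """
    decayed_utility = obj.utility * 0.5 ** (obj.age_days / half_life_days)
    holding_cost = obj.size_gb * cost_per_gb
    return decayed_utility < holding_cost


# Illustrative objects: a large video clip and a small text report, both 60 days old.
video_clip = InfoObject("fmv_clip", utility=0.9, size_gb=50.0, age_days=60.0)
text_report = InfoObject("report", utility=0.9, size_gb=0.001, age_days=60.0)

age_out_clip = should_age_out(video_clip)      # 0.225 utility vs 0.5 cost
age_out_report = should_age_out(text_report)   # 0.225 utility vs 0.00001 cost
```

Under these assumed numbers the large clip would be moved to a compact long-term representation while the small report stays in place, which is the kind of utility-versus-cost trade the objective describes.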
Data Management Roadmap


•  Data Representation

-  Format normalization

-  Storage lifecycle mgmt

•  Data Access

-  Indexing & retrieval

-  Manipulation

-  Ease of use

•  Data/Knowledge Search

-  User/Task-Tailored Methods

-  Knowledge Discovery Focused

•  Scalable Computation

-  Architectures

-  Multi-structured computation

-  Distributed processing

•  Autonomous Networks

-  Mapping Info to Missions

-  Prediction models

-  Resource optimization

Milestones (from chart):

-  Distributed product synthesis from distributed stores

-  Interactive resource-aware content tailoring

-  Medium Latency: 10s Interactive Users, ~10 GFLOPS/Request

-  Low Latency: 10s Interactive Users, ~10 GFLOPS/Request

-  Medium Latency: 1000s Interactive Users, ~100 GFLOPS/Request

-  Mission-based Annotation

-  Tasking-order to Info prediction

-  Predicting Resource Shortfalls

-  Mission-aware Capacity Allocation
Analytic Layer


•  Problem Statement: Existing automation tools do not aid users in
finding today's complex and adaptable threats within mission
timelines

•  3-to-5  year  timeframe  objective 

-  Robust  classification  to  accurately  detect,  geo-register,  classify,  and  identify 
surface  objects  despite  difficult  environments,  configurations  and  emplacements 

-  Robust  automation  tools  to  identify  relationships,  patterns  of  life  and  activities  of 
objects  on  the  ground 

-  Robust tools to capture, store, and retrieve HUMINT-based information to
identify and leverage popular support against insurgents

-  Domain-specific  tools  to  capture,  search,  mine  and  exploit  explicit  information 
on  insurgent  networks  from  unstructured  textual  data  sources 

•  7-to-10 year timeframe objective

-  Robust  automation  tools  to  identify  relationships,  patterns  of  life  and  activities  of 
dismounts 

-  Robust  tools  to  search,  mine  and  exploit  open-source  data  to  identify  all  aspects 
of  insurgent  networks 
MOVINT Roadmap


FY12  FY13  FY14  FY15  FY16  FY17  FY18  FY19  FY20  FY21

•  Context  Aware  Tracking 

-  Real-time  Context  Mapping 

-  Track  Performance  Model 

•  Multi-Source  Tracking 

-  Track  Fusion 

-  Track  through  Gaps 

-  Move-Stop-Move 

•  Performance  Based 

-  Data  Warehouse 

-  Automatic  Parameter  Tuning 

•  Advanced  Tracking 

-  Feature-Aided  Tracking 

-  Graph Theoretic Approaches

•  Behavior  Modeling 

-  Patterns  of  Life 

-  Activity  Recognition 

•  Data  Collections 

•  Demonstrations 

•  Milestones 


Milestone values (from chart):

-  7-10 Tracks/hr, 5-10 Minutes

-  30 Tracks/hr, 20 Minutes

-  100 Tracks/hr, 20 Minutes

-  100 Tracks/hr, 40 Minutes

-  200 Tracks/hr, 100 Minutes

-  750 Tracks/hr, 60 Minutes

-  1000 Tracks/hr, 80 Minutes

-  50 Tracks/hr, Baseline

-  750 Tracks/hr, Baseline+25%

-  1000 Tracks/hr, Baseline+50%

-  40% Confidence

-  60% Confidence

-  90% Confidence

-  0.001 False Alarms/week

Milestone and demonstration labels (from chart):

-  Automated Parameter Tuning

-  Patterns of Life

-  Multi-Source Tracking Through Gaps

-  Gross Patterns of Behavior

-  Advanced Feature Aided Tracking

-  Fine Patterns of Behavior

-  Recognition of Activity Type
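
To make the "Track through Gaps" roadmap item more concrete, the sketch below links a lost tracklet to a later one by coasting its last state forward with a constant-velocity model and gating on distance. This is one simple, commonly used approach chosen for illustration, not the program's algorithm; the `Tracklet` fields, `max_gap_s`, and `gate_m` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Tracklet:
    t_start: float                   # seconds
    t_end: float                     # seconds
    start_xy: Tuple[float, float]    # meters, position at t_start
    end_xy: Tuple[float, float]      # meters, position at t_end
    end_vxy: Tuple[float, float]     # meters/second, velocity at t_end


def link_through_gap(lost: Tracklet,
                     candidates: Sequence[Tracklet],
                     max_gap_s: float = 30.0,
                     gate_m: float = 25.0) -> Optional[int]:
    """Return the index of the best continuation of `lost`, or None.

    A candidate qualifies if it starts after `lost` ends, within `max_gap_s`,
    and within `gate_m` of the constant-velocity predicted position.
    """
    best_index, best_error = None, gate_m
    for i, cand in enumerate(candidates):
        gap = cand.t_start - lost.t_end
        if not (0.0 < gap <= max_gap_s):
            continue
        # Coast the lost track across the gap.
        pred_x = lost.end_xy[0] + lost.end_vxy[0] * gap
        pred_y = lost.end_xy[1] + lost.end_vxy[1] * gap
        error = math.hypot(cand.start_xy[0] - pred_x, cand.start_xy[1] - pred_y)
        if error <= best_error:
            best_index, best_error = i, error
    return best_index


# Illustrative scenario: a vehicle heading east at 10 m/s is lost at t = 10 s.
lost = Tracklet(0.0, 10.0, (-100.0, 0.0), (0.0, 0.0), (10.0, 0.0))
candidates = [
    Tracklet(15.0, 40.0, (52.0, 1.0), (300.0, 1.0), (10.0, 0.0)),    # near prediction
    Tracklet(15.0, 40.0, (0.0, 100.0), (0.0, 300.0), (0.0, 8.0)),    # far from prediction
    Tracklet(100.0, 120.0, (900.0, 0.0), (1100.0, 0.0), (10.0, 0.0)),  # gap too long
]
match = link_through_gap(lost, candidates)
```

A fielded tracker would add features, motion uncertainty, and a global assignment step; the gate here only shows the basic idea of bridging a coverage gap.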
IMINT Roadmap


•  Multi-Source  Detection 

-  Precision  geo-registration 

-  Multi-INT  change  detection 

-  Scalable  compression 

•  Geometric  Features 

-  3D  reconstruction 

-  Rapid  target  insertion 

-  Geometric  clustering 

•  Advanced  Learning 

-  Large  corpus  training 

-  Model-based  learning 

-  On-the-fly  adaptation 

•  Performance  Models 

-  Sensor/Algorithm  trade-off 

-  Confidence  reporting  algorithm 

-  Predictive  performance  estimation 

•  Accurate  Geo-location 

-  Dynamic  adaptive  sensor  models 

-  Disparate  geometry  and  phenomenology 

Milestones (from chart):

-  ^5X increase # Objects

-  FA Reduction Demo

-  Performance Driven Sensing Demo
Textual Data Roadmap


•  Data Preparation

-  Zoning (Source-specific)

-  OCR

-  ASR

•  Efficient Text Mining

-  Efficient Text Mining

-  Doc/Corpus Categorization

•  Entity/Event Consolidation

-  Entity Coreference, Consolidation

-  Event Coreference, Consolidation

•  Sentiment Extraction

-  Explicit

-  Latent

•  Portability (Genre/Domain/L)

-  Port Entities (E) to new G/D/L

-  Port E, Relations, Events

Milestones (from chart):

-  Zone known, F95 (TBD)

-  Zone known/new, F75 (FUSE?)

-  Zone web-scale, known, new/variant doc types, F90 (TBD Program)

-  Cross-Lingual OCR, 90% (TBD)

-  Cross-Lingual ASR, 95% (TBD)

-  160 docs/day (Exceed)

-  192 docs/day (TBD)

-  320 docs/day (TBD)

-  .9 F (TBD)

Abbreviations:

E = Entity
EV = Event
L = Language
ML = Multilingual
R = Relation
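
The F-style targets on this roadmap (for example F95 and .9 F) are read here as F-measure goals for extraction quality; that reading is an assumption, since the slide does not define them. Under that assumption, a minimal sketch of scoring an extractor against ground truth looks like this. The function name and the sample entities are illustrative only.

```python
from typing import Set


def f_measure(extracted: Set[str], reference: Set[str], beta: float = 1.0) -> float:
    """F-beta score of extracted items against ground-truth items.

    beta = 1 weights precision and recall equally (the usual F1).
    Returns 0.0 when nothing is extracted or nothing is correct.
    """
    true_positives = len(extracted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(extracted)
    recall = true_positives / len(reference)
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative data: 20 ground-truth entities; the extractor finds 19 of them
# and adds one spurious entity.
reference = {f"entity_{i}" for i in range(20)}
extracted = {f"entity_{i}" for i in range(19)} | {"spurious_entity"}
score = f_measure(extracted, reference)
```

Here precision and recall are both 19/20, so the score is 0.95, which would correspond to an "F95" target under the assumed reading.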
User Interaction Layer


•  Problem Statement: Existing interface tools do not detect and
proactively respond to the user's information needs, given massive
amounts of data collected from sensor and open-source assets.

•  3-to-5 year timeframe objective

-  Reactive intelligent interfaces

   •  Acquisition of massive data, including continuous learning and inference

   •  Automatic identification of potential (human) collaborators

   •  User-specified interface reconfiguration

-  Adaptable displays that automatically draw human attention to problem areas

-  Workflow tools that guide analysts in complex problems

•  7-to-10 year timeframe objective

-  Proactive intelligent interfaces and inference engines that:

   •  Generate and update rich models of their users' current tasks, beliefs, and intentions

   •  Use socially guided machine learning to support level 2+ fusion

   •  Proactively identify task-relevant data based on current estimates of users' beliefs
      and intentions, and offer suggestions based on these estimates

   •  Communicate with users in the most natural way possible (language, when
      appropriate)

-  Workflow tools that capture and teach analysts' best practices
User Interaction Roadmap


FY12  FY13  FY14  FY15  FY16  FY17  FY18  FY19  FY20  FY21

•  Knowledge Mgmt Tools

-  Mission Level Knowledge Tagging

-  Normalization of Ontologies

Milestone: Data pedigree & history 'unpacking' >90% accuracy

-  Continuous Learning & Inf.

•  Large Scale Cont. Learning

•  Fast Inference in Large KBs

-  Collaboration Tools

•  Topic/Interest Models

-  Collaboration Recommendation

Milestones: Manual correlations >90% relevance; automated correlations from
LSA >70% relevance; automated correlations from LSA >90% relevance

Milestones: Manually monitored & correlated >90%; automated push-pull of
shared searches >80%; automated P/P >95% relevance

•  User-Specified Interfaces

-  User Supervision of ML Stub

-  Learning Human Operator

Milestones: Automation in display >90% relevance, ML; collaboration
mechanisms >90% relevance

-  Socially Guided ML

-  Active Transfer Learning

-  Interactive ML

Milestones: Manual correlations >90% relevance; automated correlations from
LSA >70% relevance; automated correlations from LSA >90% relevance

•  Rich User Models

-  Socio-Cognitive Architectures

-  Natural Language Dialogue

Milestones: Visual representation of people & tasks >70% accuracy; >90% accuracy

-  Best-Of-Breed Strategy Learning

-  Crowdsourcing BoB Strategies

Milestones: Machine advisors >90% accuracy; machine advisors >98% accuracy
FY12 Planned BAAs


•  Data to Decision Special Notice, Spring 2012

-  POC: Dr. Carey Schwartz

•  ONR Long Range BAA, BAA 12-01

-  POC: Dr. Wen Masters

•  Research Interests of AFOSR, BAA 2010-1

-  POC: Dr. Hugh De Long

•  ARO Core BAA, W911NF-07-R-003-04

-  POC: Dr. John Lavery

•  DARPA I2O Office Wide BAA, 11-34

-  POC: Mr. Daniel Kaufman

•  ONR Computational Intelligence for Rapid Accurate
Decision Making Special Notice, Spring 2012

-  POC: Dr. Carey Schwartz
Summary


•  Data  representative  of  the  problem  domain  with  ground  truth  to  be 
made  available  for  development  and  testing  of  algorithms 

•  Specifications of a Service Oriented Architecture will be made
available to support government testing and evaluation

•  Understand  the  relationship  between  the  “picture”  and  decisions 
based  upon  the  picture 

-  Bottom-up to identify performance-controlling functions/modules

-  Top-down to manage quality of picture and manage resources

•  Symbiotic  Relationship  between  automation  and  humans 

-  Human  is  cognitive  within  the  architecture  and  not  a  servant  to  the  architecture 

-  Human  mentors  the  architecture  to  improve  performance 

•  Reduce the timelines from receipt of data, to understanding what it means,
to deciding what should be done, across decision support systems